Welcome to today’s Coffee & Coding session. This is a follow-up session after our Data Byte series. Today, we will explore how to work with APIs in R using the NHS Open Data Portal as our example. APIs (Application Programming Interfaces) allow us to access data from various web services, making them a powerful tool for data analysis and application development. Additionally, we will cover how to create your own API in R using plumber.
An API is a communication layer that enables different systems to interact without needing to know the details of each other. This session focuses on web APIs.
# install.packages("httr2")
# install.packages("jsonlite")
# install.packages("dplyr")
# install.packages("data.table")
# install.packages("devtools")
# install.packages("plumber")
# install.packages("stringr")
library(httr2)
library(jsonlite)
library(dplyr)
library(data.table)
library(devtools)
library(plumber)
library(stringr)
library(plotly)
httr2 is an R package designed for making HTTP requests
and handling responses. - Simplified syntax for constructing HTTP
requests - Support for common HTTP methods (GET, POST, PUT, DELETE) -
Tools for handling authentication, including OAuth. - Functions for
managing request headers, query parameters, and body content. -
Capabilities for processing and parsing HTTP responses.
jsonlite is an R packages that working with JSON data. -
Functions for parsing JSON string into R objects (fromJSON)
- Functions for converting R objects into JSON string
(toJSON) - Support for complex data structures, including
nested lists and data frames - Options for pretty-printing and
compacting JSON output - Compatibility with streaming JSON data for
larlge datasets
plumber enables the creating of RESTful APIs from R
scripts. It allows users to expose R functions as web services, making
it easy to integrate R with web applications. - Simple annotations to
define API endpointss - Support for HTML methods (GET, POST etc) - Tools
for managing API documentation and testing endpoints.
An endpoint is a specific URL that allows you to access a particular part of an API. For example, an API might have different endpoints for retrieving data, submitting data, or updating data. The structure of an endpoint typically includes the base URL and additional parameters or paths that specify the exact resource you want to interact with. It connects API clients and servers.
The Open Data Portal (ODP) uses CKAN (Comprehensive Knowledge Archive Network), an open-source data portal designed for the storage and distribution of open data. To learn how to interact with the NBSBSA ODP site, you can refer to the CKAN API guide available here.
You can find useful example code in here
# Define the base URL and endpoint
# To get a full JSON representation of a resource
base_endpoint <- "https://opendata.nhsbsa.net/api/3/action/"
package_show_method <- "package_show?id="
dataset_id <- "english-prescribing-data-epd"
# Combine the base URL and endpoint
url <- paste0(base_endpoint, package_show_method, dataset_id)
print(url) # We can check this url
## [1] "https://opendata.nhsbsa.net/api/3/action/package_show?id=english-prescribing-data-epd"
# Make the GET request using httr2
# Sends the request to the API and stores the response.
response <- request(url) |>
req_perform()
if (resp_status(response) == 200) { # or response$status_code
# Parse the content of the response
content <- resp_body_string(response)
data <- fromJSON(content) # We can check data
# Save this as data frame
resource_table <- data$result$resources
# tail(resource_table)
} else {
stop("API request failed with status: ", resp_status(response))
}
content refers to the body of the response from the
API.resp_body_json directly parse raw byte data into JSON
object. It stores in the data variable.
From data$size, we can see that each English Prescribing
Dataset (EPD) is about 6GB. We can construct SQL queries within the API
to download smaller, more focused datasets by selecting only the columns
of interest. By utilising the datastore_search_sql?
endpoint and specifying the resource_id, we can run SQL
queries to extract specific data.
For example, to extract data from the April 2024 data set, specially focusing on BNF chapter 5.1 (Antibacterial drugs) at the Integrated Care Board (ICB) level, and create a bar chart for comparison, we can use the following SQL query.
sql_action_method = "datastore_search_sql?"
# Pull the latest month antibiotic drug prescription data by ICB.
resource_id = "EPD_202404"
# Build our query which will feed after sql_action_method
tmp_query <- paste0(
"
SELECT
YEAR_MONTH,
REGIONAL_OFFICE_NAME,
ICB_NAME,
BNF_CHAPTER_PLUS_CODE,
COUNT(ITEMS) as ITEM_COUNT
FROM `",
resource_id, "`
WHERE
BNF_CHEMICAL_SUBSTANCE LIKE '0501%'
AND ICB_NAME != 'UNIDENTIFIED'
GROUP BY
YEAR_MONTH,
REGIONAL_OFFICE_NAME,
BNF_CHAPTER_PLUS_CODE,
ICB_NAME
"
)
# Build temporary API call
tmp_api_call <- paste0(
base_endpoint,
sql_action_method,
"resource_id=",
resource_id,
"&",
"sql=",
URLencode(tmp_query) # Encode spaces in the url
)
print(tmp_api_call)
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202404&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20REGIONAL_OFFICE_NAME,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202404%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501%25'%0A%20%20AND%20ICB_NAME%20!=%20'UNIDENTIFIED'%20%20%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20REGIONAL_OFFICE_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20ICB_NAME%0A%20%20"
# We can check in browser
# Grab the response JSON as a temporary list
tmp_response <- jsonlite::fromJSON(tmp_api_call)
# Extract records in the response to a temporary dataframe
antibiotic_icb_df <- data.table::as.data.table(tmp_response$result$result$records)
# Plot this
# Create the Plotly bar chart
fig <- plot_ly(data = antibiotic_icb_df,
x = ~ICB_NAME,
y = ~ITEM_COUNT,
type = 'bar',
text = ~paste(REGIONAL_OFFICE_NAME, "<br>Items:", ITEM_COUNT),
marker = list(color = 'rgba(50, 171, 96, 0.6)',
line = list(color = 'rgba(50, 171, 96, 1.0)',
width = 1.5))) |>
layout(title = "ICBs with antibacterial drugs items",
xaxis = list(title = "ICB Name", tickangle = -45),
yaxis = list(title = "Item Count"),
margin = list(b = 150),
hovermode = "closest")
# Display the chart
fig
URLencode is a function in R that is used to encode
characters in a URL so that they can be transmitted over the internet.
This function ensures that special characters in URLs are properly
encoded to avoid issues with URL parsing and transmission.
For example: - A space character () will be encoded as
%20. - An apostrophe (') will be encoded as
%27. - The greater than or equal to sign
(>=) will be encoded as %3E%3D etc.
URLencode is used to encode the SQL query string before
appending it to the URL for the API call. This ensures that any special
characters in the SQL query are properly encoded.
Let’s expand our query to loop through multiple months of data. We will extract monthly dispensed Amoxicillin (0501013B0) items in one random GP practice in North East & Yorkshire.
# extract multiple months for GP practices in North East and Yorkshire region
sql_action_method = "datastore_search_sql?"
# start_month & end_month define
start_month = "EPD_202305"
end_month = "EPD_202404"
# Filter within specific range from resources_table
resource_ids = resource_table |>
filter(name >= start_month & name<= end_month) |>
pull(name)
# check
print(resource_ids)
## [1] "EPD_202305" "EPD_202306" "EPD_202307" "EPD_202308" "EPD_202309"
## [6] "EPD_202310" "EPD_202311" "EPD_202312" "EPD_202401" "EPD_202402"
## [11] "EPD_202403" "EPD_202404"
lane_end_df <- data.frame()
for (month in resource_ids){
tmp_query <- paste0(
"
SELECT
YEAR_MONTH,
ICB_NAME,
PRACTICE_NAME,
BNF_CHAPTER_PLUS_CODE,
CHEMICAL_SUBSTANCE_BNF_DESCR,
COUNT(ITEMS) as ITEM_COUNT
FROM `",
month, "`
WHERE
BNF_CHEMICAL_SUBSTANCE LIKE '0501013B%'
AND PRACTICE_NAME LIKE 'LANE END SURGERY'
GROUP BY
YEAR_MONTH,
ICB_NAME,
BNF_CHAPTER_PLUS_CODE,
CHEMICAL_SUBSTANCE_BNF_DESCR,
PRACTICE_NAME
"
)
# Build temporary API call
tmp_api_call <- paste0(
base_endpoint,
sql_action_method,
"resource_id=",
month,
"&",
"sql=",
URLencode(tmp_query) # Encode spaces in the url
)
print(tmp_api_call)
tmp_response <- jsonlite::fromJSON(tmp_api_call)
# Extract records in the response to a temporary dataframe
tmp_df <- data.table::as.data.table(tmp_response$result$result$records) # we can look at our JSON structure
# Bind the temporary data to the main dataframe
lane_end_df <- dplyr::bind_rows(lane_end_df, tmp_df)
}
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202305&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202305%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202306&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202306%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202307&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202307%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202308&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202308%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202309&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202309%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202310&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202310%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202311&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202311%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202312&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202312%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202401&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202401%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202402&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202402%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202403&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202403%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
## [1] "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql?resource_id=EPD_202404&sql=%0A%20%20SELECT%0A%20%20%20%20%20%20YEAR_MONTH,%0A%20%20%20%20%20%20ICB_NAME,%0A%20%20%20%20%20%20PRACTICE_NAME,%0A%20%20%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20%20%20COUNT(ITEMS)%20as%20ITEM_COUNT%0A%20%20FROM%20%60EPD_202404%60%0A%20%20WHERE%0A%20%20%20%20BNF_CHEMICAL_SUBSTANCE%20LIKE%20'0501013B%25'%0A%20%20AND%20PRACTICE_NAME%20LIKE%20'LANE%20END%20SURGERY'%0A%20%20GROUP%20BY%0A%20%20%20%20YEAR_MONTH,%0A%20%20%20%20ICB_NAME,%0A%20%20%20%20BNF_CHAPTER_PLUS_CODE,%0A%20%20%20%20CHEMICAL_SUBSTANCE_BNF_DESCR,%0A%20%20%20%20PRACTICE_NAME%0A%20%20"
Let’s revisit API guide in the KCAN page. We will use
package_list to see what other data sets are available in
our Open Data Portal site.
base_endpoint <- "https://opendata.nhsbsa.net/api/3/action/"
package_list_method <- "package_list"
# Combine the base URL and endpoint for listing packages
url_list <- paste0(base_endpoint, package_list_method)
# Perform a request to fetch the results
response_list <- request(url_list) |>
req_perform()
# Check the status of the response, 200 means success
if (resp_status(response_list) == 200) {
# Parse the content of the response directly as JSON using httr2
data_list <- resp_body_json(response_list)
# Extract the list of dataset IDs each FOI is separate ids
dataset_ids <- data_list$result
# Filter the dataset IDs to include only those starting with "foi"
foi_ids <- dataset_ids[grepl("^foi", dataset_ids)]
# change foi_ids a vector to add to data frame
foi_ids <- unlist(foi_ids)
foi_ids_df <- data.frame(ID = foi_ids, stringsAsFactors = FALSE)
View(foi_ids_df)
} else {
cat("Failed to retrieve data: HTTP status", resp_status(response_list), "\n")
}
Can you pick any foi from foi_ids data frame and extract what was their request and response?
# base_endpoint <- "https://opendata.nhsbsa.net/api/3/action/"
# package_show_method <- "package_show?id="
# dataset_id <- "foi-_____"
#
# # Combine the base URL and endpoint
# url <- paste0(base_endpoint, package_show_method, dataset_id)
#
# print(url)
#
# # url should look like this: https://opendata.nhsbsa.net/api/3/action/package_show?id=foi-01040
#
# # Perform a request to fetch the results using httr2
# response <- request(___) |>
# httr2::_____() #httr2 function
#
# # Check the status of the response
# if (resp_status(response) == ___) { # Success status should return this numeric value
# # Parse the content of the response directly as JSON
# data <- httr2::___(___) #httr2 function family of : resp_*
#
# # Extract the notes part from JSON
# notes <- data$result$___
#
# # Find the index of the start of the response part (most case start with \n#Response etc)
# response_start <- regexpr("\n#", notes)
#
# if (response_start[1] != -1) { # if response find
# request_part <- substr(notes, 1, response_start[1] - 1) # Request will be the all the text until Response
# response_part <- substr(notes, response_start[1], nchar(notes)) # All response
# } else {
# request_part <- notes
# response_part <- ""
# }
#
#
# # Extract tags
# tags <- data$result$tags
#
# tag_names <- sapply(tags, function(tag) tag$name)
# concatenated_tags <- paste(tag_names, collapse = ", ")
#
#
#
# # Clean text by removing all characters except alphanumeric, spaces, and URL patterns
# clean_text <- function(text) {
# text <- gsub("(?<!http:|https:)[^[:alnum:] [:space:] /:._-]", "", text, perl = TRUE)
# return(text)
# }
#
# request_part_clean <- clean_text(request_part)
# response_part_clean <- clean_text(response_part)
#
# # Display the cleaned parts & tags
# cat("Tags:\n", concatenated_tags, "\n")
# cat("Request Part:\n", request_part_clean, "\n\n")
# cat("Response Part:\n", response_part_clean, "\n")
# } else {
# cat("Failed to retrieve data: HTTP status", resp_status(response), "\n")
# }
Example answer:
base_endpoint <- "https://opendata.nhsbsa.net/api/3/action/"
package_show_method <- "package_show?id="
dataset_id <- "foi-01960"
# Combine the base URL and endpoint
url <- paste0(base_endpoint, package_show_method, dataset_id)
print(url)
## [1] "https://opendata.nhsbsa.net/api/3/action/package_show?id=foi-01960"
# url should look like this: https://opendata.nhsbsa.net/api/3/action/package_show?id=foi-01040
# Perform a request to fetch the results using httr2
response <- request(url) |>
req_perform() #httr2 function
# Check the status of the response
if (resp_status(response) == 200) { # Success status should return this numeric value
# Parse the content of the response directly as JSON
data <- resp_body_json(response) #httr2 function family of : resp_*
# Extract the notes part from JSON
notes <- data$result$notes
# Find the index of the start of the response part (most case start with \n#Response etc)
response_start <- regexpr("\n#", notes)
if (response_start[1] != -1) { # if response find
request_part <- substr(notes, 1, response_start[1] - 1) # Request will be the all the text until Response
response_part <- substr(notes, response_start[1], nchar(notes)) # All response
} else {
request_part <- notes
response_part <- ""
}
# Extract tags
tags <- data$result$tags
tag_names <- sapply(tags, function(tag) tag$name)
concatenated_tags <- paste(tag_names, collapse = ", ")
# Clean text by removing all characters except alphanumeric, spaces, and URL patterns
clean_text <- function(text) {
text <- gsub("(?<!http:|https:)[^[:alnum:] [:space:] /:._-]", "", text, perl = TRUE)
return(text)
}
request_part_clean <- clean_text(request_part)
response_part_clean <- clean_text(response_part)
# Display the cleaned parts & tags
cat("Tags:\n", concatenated_tags, "\n")
cat("Request Part:\n", request_part_clean, "\n\n")
cat("Response Part:\n", response_part_clean, "\n")
} else {
cat("Failed to retrieve data: HTTP status", resp_status(response), "\n")
}
## Tags:
## HR
## Request Part:
## Thank you for your request for information about the following:
##
## I would like to know:
##
## 1 How many staff the NHS Business Services Authority has.
##
## 2 How many staff in the NHS Business Services Authority have died each year for the last 5 years in RTCs.
##
## 3 Of those I would like to know if they were working at the time commuting to work or off duty.
## It would also be good to know what percentage of the total deaths were in a RTC for the police it was 53.
##
## 4 I would also like to know if you have a Road safety strategy like the MoD Defence Road Safety Strategy - GOV.UK www.gov.uk and if you have guidance on managing fatigue along the lines of what the ORR are currently consulting on for rial staff Managing rail staff fatigue: draft version for consultation orr.gov.uk.
##
## Your request was received on 23 May 2024 and I am dealing with it under the terms of the Freedom of Information Act 2000.
##
##
## Response Part:
##
## Response
##
## 1. 4827
##
## 2. The NHS Business Authority have had no deaths in service caused by Road Traffic Collisions
##
## 3. N/A
##
## 4. The NHS Business Authority does not have a road safety strategy
##
## Please note that this request and our response is published on our Freedom of Information disclosure log at:
##
## https://opendata.nhsbsa.net/dataset/foi-01960
##
## Data Queries
##
## If you have any queries regarding the data provided or if you plan on publishing the data please contact dataandinsightsupportnhsbsa.nhs.uk / foidatanhsbsa.nhs.uk / foirequestsnhsbsa.nhs.uk ensuring you quote the above reference. This is important to ensure that the figures are not misunderstood or misrepresented.
##
## If you plan on producing a press or broadcast story based upon the data please contact communicationsteamnhsbsa.nhs.uk This is important to ensure that the figures are not misunderstood or misrepresented.
##
## The information supplied to you continues to be protected by the Copyright Designs and Patents Act 1988 and is subject to NHSBSA copyright. This information is licenced under the terms of the Open Government Licence detailed at:
## http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
##
## Should you wish to re-use the information you must include the following statement:
## NHSBSA Copyright 2024 This information is licenced under the terms of the Open Government Licence:
##
## http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
##
## Failure to do so is a breach of the terms of the licence.
##
## Information you receive which is not subject to NHSBSA Copyright continues to be protected by the copyright of the person or organisation from which the information originated. Please obtain their permission before reproducing any third party non NHSBSA Copyright information.
Similar to EPD dataset, we can expand the process of extracting all data from FOI datasets. This allows us to tidy the text further and conduct various analyses, such as keyword extraction and other NLP (Natural Language Processing) techniques.
base_endpoint <- "https://opendata.nhsbsa.net/api/3/action/"
package_list_method <- "package_list"
url <- paste0(base_endpoint, package_list_method)
response_list <- request(url) |>
req_perform() # from here we want to extract only foi-xxxxx etc
if (resp_status(response_list) == 200) {
data_list <- resp_body_json(response_list) # pass it as JSON
# Extract the list of dataset IDs each FOI is separate ids
dataset_ids <- data_list$result
# Using grepl to extract id starts with foi
foi_ids <- dataset_ids[grepl("^foi", dataset_ids)]
# Select the top 30 foi (for demonstration, as it takes a while if we run all)
top_foi_ids <- head(foi_ids, 30)
# Create empty data frame which will hold our informations
results_df <- data.frame(ID = character(),
Tags = character(),
Request = character(),
Response = character(),
stringsAsFactors = FALSE)
# Loop through each FOI ID to fetch and process the data
for (dataset_id in top_foi_ids) {
# Combine the base URL and endpoint for each dataset
url <- paste0(base_endpoint, "package_show?id=", dataset_id)
# Perform a request to fetch the results using httr2::req_perform
response <- request(url) |>
httr2::req_perform()
if (resp_status(response) == 200) {
# Parse the content of the response directly as JSON
data <- httr2::resp_body_json(response) # httr2 function family of : resp_*
# Extract the notes part from JSON
notes <- data$result$notes
# Find the index of the start of the response part (most case start with \n#Response etc)
response_start <- regexpr("\n#", notes)
if (response_start[1] != -1) { # if response found
request_part <- substr(notes, 1, response_start[1] - 1) # Request will be all the text until Response
response_part <- substr(notes, response_start[1], nchar(notes)) # All response
} else {
request_part <- notes
response_part <- ""
}
# Extract tags
tags <- data$result$tags
tag_names <- sapply(tags, function(tag) tag$name)
concatenated_tags <- paste(tag_names, collapse = ", ")
# Clean text by removing all characters except alphanumeric, spaces, and URL patterns
clean_text <- function(text) {
text <- gsub("(?<!http:|https:)[^[:alnum:] [:space:] /:._-]", "", text, perl = TRUE)
return(text)
}
request_part_clean <- clean_text(request_part)
response_part_clean <- clean_text(response_part)
# Append the results to the data frame
# each result will be row binding
results_df <- rbind(results_df, data.frame(ID = dataset_id,
Tags = concatenated_tags,
Request = request_part_clean,
Response = response_part_clean,
stringsAsFactors = FALSE))
} else {
cat("Failed to retrieve data for ID:", dataset_id, "- HTTP status", resp_status(response), "\n")
}
}
# View the results data frame
View(results_df)
} else {
cat("Failed to retrieve data: HTTP status", resp_status(response_list), "\n")
}
Plumber is an R package designed to make it easy to create web APIs using R. It allows R users to expose R functions as API endpoints with minimal effort, transforming R scripts into HTTP services. Plumber is particularly useful if you have your analytical models or data processing tasks as web services without needing to learn a new programming language or framework.
Define your API by adding special comments, or “annotations,” to your
R functions. These annotations specify how each function should behave
when it receives an HTTP request. Plumber supports various HTTP methods
(such as GET, POST, PUT, DELETE), making it flexible for different types
of web services. - Plumber requires annotation starts with
#* - For example #* @get defines GET endpoint
like our example code plumber.r
Plumber APIs can handle inputs and outputs in various formats, including JSON, which is commonly used for data exchange on the web. This makes it easy to integrate R with other web-based applications and services.
Once defined, a plumber API can be run locally for testing or deployed to a server for broader access. Deployment can be done on cloud platforms, dedicated servers, or through services like RStudio Connect, which provide additional features such as user authentication, secure access, and easy management of deployed APIs.
Overall, Plumber enables R users to leverage their existing R code and expertise to create powerful, interactive, and shareable web services, facilitating the integration of R’s analytical capabilities into a wide range of applications.
Imagine a regional health authority that wants to fetch antibiotic prescription data for April 2024 (202404) and visualize it in their dashboard to monitor patterns across ICB.
We will use Posit Connect to achieve this. Here’s the process:
Convert Code to Plumber API Script: Take our
initial example code and convert it into a Plumber API script (e.g.,
plumber.R):
r <- plumber::plumb(file = “./2024-06-25-API/plumber.R”) r$run(port = 8000)
Test Locally: Run the API locally to ensure it works correctly.
Publish to Posit Connect: Once satisfied with the local testing, publish the Plumber API script to Posit Connect. This platform will provide a URL for your API.
Share the API URL: Share the provided URL so
others can access your API. For example, the URL might look like this:
https://connect.nhsbsa-analytics.com/antibiotic_data?resource_id=202404
Using this URL, users can access the API and create charts to visualize the data as needed.
In this session, we explored how to work with APIs in R, focusing on accessing data from the NHS Open Data Portal and creating our own APIs using the R Plumber package. We learned how to make HTTP requests, parse JSON responses, and query specific datasets. Additionally, we covered how to use Plumber to expose R functions as API endpoints, facilitating the integration of R’s analytical capabilities into web services. By leveraging these tools, you can enhance your data workflows and create interactive, shareable web services.